Abstract

We investigate the distribution of flavonoids, a major category of plant
secondary metabolites, across species. Flavonoids are known to show high
species specificity, and were once considered chemical markers for
understanding adaptive evolution and for characterizing living organisms. We
investigate their distribution among species using bipartite networks, and
find that two heterogeneous distributions are conserved among several
families: the power-law distributions of the number of flavonoids in a
species and of the number of shared species of a particular flavonoid. In
order to explain the possible origin of this heterogeneity, we propose a
simple model with, essentially, a single parameter. We show that the two
power-law statistics emerge from simple evolutionary mechanisms based on a
multiplicative process. These findings provide insights into the evolution of
metabolite diversity and the characterization of living organisms that defy
genome sequence analysis for different reasons.

1 Introduction



Living organisms produce compounds of many types via their metabolisms,
which are believed to change adaptively with a changing environment across
a long evolutionary history. Elucidating the design principles behind such
complex systems is a major goal in natural science. Toward this end, the
structure of metabolic networks has been actively investigated using network
analysis from the viewpoint of statistical mechanics. As a result, striking
structural properties such as scale-free (heterogeneous) connectivity and
hierarchical organization have been revealed, and possible origins have been
discussed via several models (e.g., reviewed in Refs. (1; 2; 3)). In addition
to considering metabolic networks, however, it is also important to consider
how metabolites are distributed among species in order to elucidate design
principles of metabolisms such as adaptive mechanisms. Metabolite
distributions have the following advantages. Since living organisms have
specific metabolite compositions because metabolisms change adaptively with
respect to the environment, we can estimate environmental adaptation
(adaptive evolution) using metabolite distributions. Moreover, they are also
useful for characterizing species relationships, which are closely linked to
ecological systems. For metabolite distributions, therefore, identifying
structures and constructing a theory (model) of evolutionary mechanisms are
key challenges for a deeper understanding of metabolism.

Flavonoids are especially interesting examples when considering metabolite
distributions among species. Secondary metabolites including flavonoids,
alkaloids, terpenoids, phenolics, and other compounds are widely observed in
angiosperms, and are not essential for preserving life, unlike basic
metabolites such as bases, amino acids, sugars, and fatty acids (building
blocks of DNA, protein, carbohydrate, and fat, respectively). However,
secondary metabolites play additional roles aiding survival in diverse
environments. Therefore, distributions of secondary metabolites are believed
to differ significantly among species due to adaptation to environments,
implying high species specificity (4). For this reason, secondary metabolites
help us to understand environmental adaptation and adaptive evolution.
Moreover, secondary metabolites, especially flavonoids, are often used as
markers in chemotaxonomy, which is a taxonomic classification based on
metabolite compositions of species that has been used for many years (5).
However, taxonomic classifications using secondary metabolites at higher
levels (e.g. family and order levels) are known to be inherently more
difficult than those at lower levels (e.g. species level).

Although the metabolite distribution provides important insights into
metabolism as discussed above, it has not attracted as much attention as
metabolic networks. This was mainly because knowledge of secondary
metabolites was not widely available. In recent years, however, the whole
picture of species-flavonoid relationships has become available in the
KNApSAcK database (6) and partly in Metabolomics.JP (7). We can now
investigate metabolite distributions among species using these websites.

In this paper, we focus on flavonoids, which are a class of secondary
metabolites, and investigate metabolite distribution among species. In order
to describe species-flavonoid relationships comprehensively, bipartite
networks are utilized. They are useful for representing relationships between
two different kinds of objects, which correspond to species and flavonoids in
this case. We first investigate degree distributions in species-flavonoid
networks in several families, and show power-law distributions of the number
of flavonoids in a species and the number of shared species of a flavonoid.
A simple model is next proposed for explaining a possible origin of the
heterogeneous distributions (power-law distributions), and it is compared to
real data. Furthermore, intuitive descriptions and mathematical evidence are
provided for the emergence of heterogeneous distributions. We finally mention
the characteristics of well-shared (hub) isoflavonoids in Fabaceae (bean
family) as an example. In addition, we discuss the possibility of more
effective selection of discriminative metabolites and taxonomic
classifications at higher levels by considering this heterogeneous
distribution, and speculate on the evolution of flavonoid diversity.



2 Methods 



A total of 14378 species-flavonoid pairs were downloaded from Metabolomics.JP
(7) (http://metabolomics.jp/wiki/Category:FL), in which 4725 species and
6846 identified flavonoid structures are linked by a published journal article.
In other words, only published data were utilized. The species-flavonoid pairs
are by no means comprehensive: no plant species has been 'completely'
investigated for its biosynthetic activity, and many flavonoid molecules whose
descriptions have yet to be published are also thought to exist.

The taxonomy (family) of a species was assigned according to The Taxonomicon
(http://taxonomicon.taxonomy.nl). We here focus on the six largest families
in terms of the number of reported flavonoids: Fabaceae (bean family),
Asteraceae (composite family), Lamiaceae (Japanese basil family), Rutaceae
(citrus family), Moraceae (mulberry family), and Rosaceae (rose family). In
particular, we can discuss in detail species-flavonoid relationships in Fabaceae
because this family is well researched.

Flavonoids consist of backbone structures and their modifications. On the 
above website, flavonoids are classified into nine groups according to their 
backbone structures: FL1: Chalcone, FL2: Flavanone, FL3: Flavone, FL4: Di- 
hydroflavonol, FL5: Flavonol, FL6: Flavan, FL7: Anthocyanin, FLI: Isoflavonoid, 
FLN: Neoflavonoid. 

In order to investigate the distribution of flavonoids (metabolites) among
species comprehensively, we here utilize bipartite networks (hereinafter
called species-flavonoid networks), which are useful for representing
relationships between two different kinds of objects (species and flavonoids
in this case). Bipartite networks are defined as graphs having two different
node sets (species and flavonoids) in which edges are only drawn between one
node set and the other node set (inter-connectivity). Note that there is no
edge between nodes belonging to the same node set (intra-connectivity). In
the species-flavonoid networks, an edge is drawn between a species node and
a flavonoid node when the species has the flavonoid.
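
The bipartite representation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`degree_tables`, `degree_frequency`) and the toy pairs are hypothetical and are not taken from the data set or from any published code. The sketch stores each edge as a (species, flavonoid) pair and counts, for every node, how many neighbours it has in the other node set.

```python
from collections import Counter, defaultdict


def degree_tables(pairs):
    """Return (n_f, n_s): flavonoids per species and species per flavonoid."""
    flavonoids_of = defaultdict(set)  # species node -> flavonoid neighbours
    species_of = defaultdict(set)     # flavonoid node -> species neighbours
    for species, flavonoid in pairs:
        # Edges only connect a species node to a flavonoid node.
        flavonoids_of[species].add(flavonoid)
        species_of[flavonoid].add(species)
    n_f = {s: len(fs) for s, fs in flavonoids_of.items()}
    n_s = {f: len(ss) for f, ss in species_of.items()}
    return n_f, n_s


def degree_frequency(degrees):
    """Map each degree value k to the number of nodes having that degree."""
    return dict(Counter(degrees.values()))


# Toy pairs (hypothetical, for illustration only).
toy_pairs = [
    ("species_A", "flavonoid_1"),
    ("species_A", "flavonoid_2"),
    ("species_B", "flavonoid_1"),
    ("species_C", "flavonoid_1"),
    ("species_C", "flavonoid_3"),
]
n_f, n_s = degree_tables(toy_pairs)
```

The two dictionaries returned by `degree_tables` correspond to the two kinds of node degree in a species-flavonoid network, and `degree_frequency` gives the raw counts from which a frequency distribution can be plotted.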



3 Results and discussion 

3.1 Heterogeneous distribution of flavonoids

We show a partial species-flavonoid network for Lamiaceae (the Japanese basil
family) in Fig. 1 as an example. The node degrees of species nodes (squares)
and flavonoid nodes (circles) are extremely varied.

In order to characterize the tendency of connectivity, we investigated
frequency distributions of the node degree (degree distributions) in
species-flavonoid networks. In bipartite networks, we find distributions of
two types because there are two types of nodes (species nodes and flavonoid
nodes). The numbers of edges of species nodes and flavonoid nodes correspond
to the number of flavonoids in a species, $n_f$, and the number of shared
species of a particular flavonoid, $n_s$, respectively.

As shown in Fig. 2, the frequency distributions of $n_f$ and $n_s$ roughly
follow a power law, implying a heterogeneous distribution of flavonoids among
species. That is, most flavonoids are shared by a few species, whereas a few
flavonoids are conserved in many species. Likewise, most species have
flavonoids of a few types, whereas a few species have flavonoids of many
types. Furthermore, the heterogeneous distributions of flavonoids among
species characterized by the power-law statistics are approximately conserved
between family-based species-flavonoid networks, suggesting a scale-free
feature. A similar result, that the number of metabolites in a species
follows a power-law distribution, has also been reported in (6).

This finding might provide insights into environmental adaptation and
adaptive evolution because compositions of secondary metabolites including
flavonoids are strongly influenced by environmental conditions. By
considering the heterogeneous distribution of flavonoids, we might be able to
detect useful metabolites relating to such adaptations. In particular, the
heterogeneous distribution of the number of shared species predicts that most
flavonoids are important for characterizing such adaptations at a species
level.

Fig. 1. A partial species-flavonoid network for Lamiaceae (the Japanese basil
family) drawn by yEd (8). The squares and circles correspond to plant species
and flavonoids, respectively. The color of circles indicates the flavonoid
class (e.g. FL1: Chalcone, FL2: Flavanone, FL3: Flavone).

However, this finding also points to the difficulty of finding metabolites
that reflect adaptations at higher levels (e.g. family and order levels). In
general, taxonomic classification (characterization of living organisms)
based on secondary metabolites including flavonoids is empirically believed
to be more difficult at higher levels than at lower levels (e.g. species
level) (5). This difficulty derives from the heterogeneous distributions:
most flavonoids are species-specific. Thus, most flavonoids contribute to
taxonomic classification at lower levels rather than at higher levels. As a
result, we may extract overrated and/or underrated information when
exhaustively considering flavonoids (metabolites) at higher levels. Hub
flavonoids might play an essential role in characterizing biological features
(e.g. environmental adaptation) at higher levels because of their
conservation; thus, hub flavonoids are expected to be more appropriate and
beneficial markers for characterization of biological features at higher
levels.

3.2 A possible origin of heterogeneous distributions 

We here speculate on a possible origin of heterogeneous distributions of flavonoids
(metabolites) among species using a simple model. We believe that the
heterogeneous distribution was acquired through evolutionary history. The origin of
heterogeneity might provide insights into the evolution of flavonoid diversity.

Fig. 2. Degree distributions of the species-flavonoid networks. (A) The frequency
distribution of the number of flavonoids in a plant species. (B) The frequency
distribution of the number of shared species of a flavonoid. The symbols indicate the
distributions for family-based species-flavonoid networks. The solid lines represent
the distributions for all-encompassing species-flavonoid networks.

3.2.1 Model 



We consider two simple evolutionary mechanisms as follows. (i) New flavonoids
are generated by variation of existing flavonoids. In evolutionary history,
species obtain new metabolic enzymes via gene duplications (9) and horizontal
gene transfers (10), and the metabolic enzymes synthesize new flavonoids
through modification of existing flavonoids with substituent groups and
functional groups. (ii) Flavonoid compositions of new species are inherited
from those of existing (ancestral) species. New species are believed to
emerge by mutation of ancestral species, and they are similar to the
ancestral species as a result. For this reason, flavonoid compositions might
be inherited by new species from ancestral species.

With consideration for the above two mechanisms, we propose a simple model 
with two parameters p and q reproducing heterogeneous distributions of flavonoids 
among species. 

Our model is defined by the following procedure: 

(a) We set an initial species-flavonoid network represented as a complete
bipartite graph with $n_0$ species and $n_0$ flavonoids (Fig. 3A).

(b) With probability p, Event I, corresponding to the emergence of a new
species, occurs. An existing species is selected at random (Fig. 3B). A new
species emerges due to mutation of the randomly selected existing species,
and the flavonoids of the existing species are inherited by the new species
as its candidate flavonoids (Fig. 3C). To account for divergence of flavonoid
compositions, the new species finally acquires flavonoids with equal
probability q for each of the candidates (Fig. 3D). However, if the new
species ends up with no flavonoids, it is neglected (removed) because of the
observation condition (species without flavonoids are not included in our
data set). In contrast to Event I, Event II, corresponding to the emergence
of a new flavonoid, occurs with probability 1 - p. A species-flavonoid pair
is selected uniformly at random (Fig. 3E). Then, the species of that pair
receives a new flavonoid (Fig. 3F).

(c) Procedure (b) is repeated until the number of species and the number of
flavonoids reach S and F, respectively.
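
Steps (a) to (c) can be sketched as a short simulation. The following Python code is a minimal illustration, not the authors' implementation: the function name `simulate_network`, the default initial size `n0=2`, and the `seed` argument are assumptions introduced here. One further assumption is needed because the procedure does not say what happens when one target is reached before the other: in this sketch an event is simply skipped once its own target (S species or F flavonoids) has been reached.

```python
import random


def simulate_network(S, F, p, q, n0=2, seed=None):
    """Grow a species-flavonoid network; return its (species, flavonoid) pairs."""
    if not (0.0 < p < 1.0 and 0.0 < q <= 1.0 and n0 >= 1 and S >= n0 and F >= n0):
        raise ValueError("need 0 < p < 1, 0 < q <= 1, n0 >= 1, S >= n0, F >= n0")
    rng = random.Random(seed)
    # (a) Complete bipartite graph on n0 species and n0 flavonoids.
    flavonoids_of = [set(range(n0)) for _ in range(n0)]
    pairs = [(s, f) for s in range(n0) for f in range(n0)]
    n_flavonoids = n0
    # (c) Repeat step (b) until both targets are reached.
    while len(flavonoids_of) < S or n_flavonoids < F:
        if rng.random() < p:
            # Event I: a new species copies a randomly chosen existing species.
            if len(flavonoids_of) >= S:
                continue  # assumption: skip once S species exist
            parent = rng.randrange(len(flavonoids_of))
            # Each candidate flavonoid is kept with probability q.
            inherited = {f for f in sorted(flavonoids_of[parent])
                         if rng.random() < q}
            if not inherited:
                continue  # species without flavonoids are not observed
            child = len(flavonoids_of)
            flavonoids_of.append(inherited)
            pairs.extend((child, f) for f in sorted(inherited))
        else:
            # Event II: the species of a uniformly chosen pair gains a new flavonoid.
            if n_flavonoids >= F:
                continue  # assumption: skip once F flavonoids exist
            species, _ = rng.choice(pairs)
            flavonoids_of[species].add(n_flavonoids)
            pairs.append((species, n_flavonoids))
            n_flavonoids += 1
    return pairs
```

For example, `simulate_network(S=60, F=80, p=60/140, q=0.26, seed=1)` returns the edge list of a small network with 60 species and 80 flavonoids. Keeping the edge list explicitly makes the uniform choice of a species-flavonoid pair in Event II a single `rng.choice` call.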



This model does not consider the loss of nodes (extinctions), which is an
important mechanism in evolutionary systems. In particular, the degree
distributions may become different due to such extinctions (11; 12). In
species-flavonoid networks, however, this mechanism tends to be nonessential
for the following reason. In plant evolution, genome doubling (polyploidy) is
a major driving force for increasing genome size and the number of genes
(13). Duplicated genes typically diversify in their function, and some
acquire the ability to synthesize new compounds. Indeed, more than 50,000
molecular structures have been elucidated in the entire plant kingdom (mostly
secondary metabolites) (14), compared to a few thousand primary metabolites
in higher animals. The population of flavonoids, the representative group of
plant secondary metabolites, is therefore expected to increase, indicating
that we can dismiss the effect of node losses.



Our model is essentially adjusted through only one parameter q because the
parameter p only controls the number of species and the number of flavonoids.
When we set the number of species S and the number of flavonoids (metabolites)
F, the parameter p can be estimated as S/(S + F).

Fig. 3. Schematic diagram of the model. Squares and circles represent plant species
and flavonoids, respectively. (A) An initial species-flavonoid relationship (network)
with $n_0 = 2$. ((B) - (D)) Event I: the emergence of a new species. The gray square
represents a randomly selected existing species. The black square indicates a new
species emerging due to duplication of the existing species. The dashed lines are possible
pairs of the new species and flavonoids. ((E) - (F)) Event II: the emergence of a new
flavonoid. The thick edge between gray nodes corresponds to a randomly selected
existing species-flavonoid pair. The black circle is a new flavonoid.

3.2.2 Relation with 'rich-get-richer' mechanisms

The emergence of heterogeneous (power-law) distributions in evolving systems
might be caused by 'rich-get-richer' or preferential mechanisms: the increase
of a statistic is proportional to the statistic itself (2; 15). We here explain that
the model has 'rich-get-richer' mechanisms.

We first consider the number of flavonoids in a species, $n_f$, which
increases when Event II occurs. The number of flavonoids of species i,
$n_f^i$, increases when a randomly selected species-flavonoid pair includes
species i. Thus, species with many flavonoids tend to be selected in such a
case. As a result, such species acquire more flavonoids, implying a
'rich-get-richer' mechanism. The origin of this preferential mechanism is
similar to that in the Dorogovtsev-Mendes-Samukhin (DMS) model (16). However,
our model is essentially different from the DMS model because the DMS model
does not describe bipartite relationships.

This is mathematically described as follows. We consider the time evolution
of $n_f^i$. Let L(t) be the total number of species-flavonoid pairs at time t;
the probability that species i with $n_f^i$ flavonoids is chosen is
$n_f^i / L(t)$ because the pair is randomly selected. In addition, Event II
occurs with probability 1 - p. Therefore, the time evolution of $n_f^i$ is
described as

$$\frac{d}{dt} n_f^i = (1 - p)\,\frac{n_f^i}{L(t)}. \qquad (1)$$



Moreover, we focus on the time evolution of L(t). The number of pairs L(t)
increases in Events I and II. In the case of Event I, L(t) increases by
$q \times L(t)/S(t)$, where S(t) is the number of species at time t, because
flavonoids of the randomly selected existing species are inherited by the new
species with probability q. Note that the expected number of flavonoids of a
randomly selected species is $\sum_{j=1}^{S(t)} n_f^j / S(t) = L(t)/S(t)$. In
the case of Event II, L(t) increases by 1. Events I and II occur with
probabilities p and 1 - p, respectively. Therefore, the time evolution of
L(t) is written as

$$\frac{d}{dt} L(t) = p\,q\,\frac{L(t)}{S(t)} + (1 - p). \qquad (2)$$

Since S(t) = pt, the solution of this equation with the initial condition
$L(1) = L_0$ is

$$L(t) = \frac{1 - p}{1 - q}\,t + \left(L_0 - \frac{1 - p}{1 - q}\right) t^{q} \qquad (3)$$

$$\approx \frac{1 - p}{1 - q}\,t, \qquad (4)$$

indicating that L(t) is approximately proportional to time t for large t and
relatively small q.



Substituting Eq. (4) into Eq. (1), we have

$$\frac{d}{dt} n_f^i = (1 - p)\,\frac{n_f^i}{L(t)} \approx (1 - q)\,\frac{n_f^i}{t}, \qquad (5)$$

suggesting the preferential mechanism: the increase of $n_f^i$ is
proportional to $n_f^i$.

From this equation, using the mean-field-based method (15), we immediately
obtain the power-law distribution of $n_f$:

$$P(n_f) \sim n_f^{-(2 - q)/(1 - q)}. \qquad (6)$$



We next consider the number of shared species of a flavonoid, $n_s$, which
increases when Event I occurs. The number of shared species of flavonoid i,
$n_s^i$, might increase when a randomly selected species has flavonoid i.
Thus, flavonoids shared by many species tend to be selected in such a case.
As a result, such flavonoids are shared by more species, reflecting a
'rich-get-richer' mechanism. The origin of this preferential mechanism is
analogous to that in the duplication-divergence (DD) model (17; 18). However,
our model is also different from the DD model in that the DD model does not
describe bipartite relationships.

This is mathematically explained as follows. We consider the time evolution
of $n_s^i$. The probability that flavonoid i is shared by a new species is
$q\,n_s^i / S(t)$ because the new species obtains flavonoid i with
probability q when one of the $n_s^i$ species with flavonoid i is randomly
selected. In addition, Event I occurs with probability p. Since S(t) = pt,
the time evolution of $n_s^i$ is therefore described as

$$\frac{d}{dt} n_s^i = p\,q\,\frac{n_s^i}{S(t)} = q\,\frac{n_s^i}{t}, \qquad (7)$$

implying the preferential mechanism: the increase of $n_s^i$ is proportional
to $n_s^i$. As for $n_f$, we immediately have the power-law distribution of
$n_s$:

$$P(n_s) \sim n_s^{-(1 + q)/q}. \qquad (8)$$
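
As a numerical check of the two exponents, the sketch below evaluates them for a given q. It assumes the standard mean-field relation that a quantity growing as t^beta yields a power-law distribution with exponent 1 + 1/beta; with beta = 1 - q for the number of flavonoids in a species and beta = q for the number of shared species, this gives (2 - q)/(1 - q) and (1 + q)/q. The function names are hypothetical and introduced only for illustration.

```python
def exponent_from_growth(beta):
    """Mean-field exponent gamma = 1 + 1/beta for a quantity growing as t**beta."""
    return 1.0 + 1.0 / beta


def exponent_flavonoids_per_species(q):
    """Exponent of P(n_f): growth rate beta = 1 - q, i.e. (2 - q) / (1 - q)."""
    return exponent_from_growth(1.0 - q)


def exponent_species_per_flavonoid(q):
    """Exponent of P(n_s): growth rate beta = q, i.e. (1 + q) / q."""
    return exponent_from_growth(q)


q = 0.26
print(round(exponent_flavonoids_per_species(q), 2))  # prints 2.35
print(round(exponent_species_per_flavonoid(q), 2))   # prints 4.85
```

Both exponents depend on q alone, which is consistent with the statement that the model is essentially adjusted through a single parameter.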



3.2.3 Comparison with real data 

We compare frequency distributions of the number of flavonoids of a plant 
species rif and the number of shared species of a flavonoid n s between our 
model and real data. We consider the frequency distributions obtained from 
the whole data set. 

The parameter p is derived from S/ (S + F). Since S = 4725 and F = 6846 in 
the actual species-flavonoid network, the parameter p becomes 0.41. 
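
The arithmetic behind this estimate can be reproduced directly; the variable names below are chosen only for illustration.

```python
S = 4725  # number of species in the data set
F = 6846  # number of flavonoids in the data set

# Fraction of growth events that add a species rather than a flavonoid.
p = S / (S + F)
print(round(p, 2))  # prints 0.41
```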

Fig. 4 shows the comparison of the frequency distributions between real data
and the model with q = 0.26. The parameter q is selected by minimizing
the distributional distance (the inset in Fig. 4), which corresponds to the
sum of tail-weighted Kolmogorov-Smirnov statistics (distances) (19) for the two
distributions [$P(n_f)$ and $P(n_s)$] between the predicted distributions and the
empirical distributions. Fig. 5 shows the comparison of the network structure
between real data and our model. Here, we chose the species-flavonoid network
for Rutaceae (the citrus family) as an example because of its reasonable size.
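
The parameter selection can be sketched as a one-dimensional search over candidate values of q. The code below is a simplified illustration and makes two loudly stated assumptions: it uses the plain two-sample Kolmogorov-Smirnov distance, not the tail-weighted variant used in the paper, and it expects a hypothetical callable `model(q)` that returns two degree samples (flavonoids per species, shared species per flavonoid) generated by the model for that q.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples (plain KS)."""
    a, b = sorted(sample_a), sorted(sample_b)
    support = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in support
    )


def select_q(candidates, empirical_nf, empirical_ns, model):
    """Return the candidate q whose model degrees are closest to the data."""
    def distance(q):
        model_nf, model_ns = model(q)
        # Sum of the distances for the two degree distributions.
        return (ks_distance(empirical_nf, model_nf)
                + ks_distance(empirical_ns, model_ns))
    return min(candidates, key=distance)
```

In practice `model(q)` would average over several realizations of the model for each q, since a single realization is noisy.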

As shown in these figures, the model is in good agreement with real data,
indicating that the two mechanisms in our model are a possible origin of
the heterogeneous distribution. The power-law statistics are conserved among
families as shown in Fig. 2, suggesting that the evolutionary mechanisms might
be universal among families.

In our model, the parameter q is the probability that flavonoid compositions
of new species, which emerge due to mutations of an existing (ancestral)
species, are inherited from ancestral species. As above, the inheritance
probability is relatively small (q = 0.26), suggesting high divergence of
flavonoid composition from ancestral species. This might be because new
species eventually acquire adaptive compositions of flavonoids different from
those of ancestral species in different environments in evolutionary history.
In particular, species' habitats might be strongly reflected in the
composition of secondary metabolites because such metabolites play crucial
roles in survival in diverse environments. For this reason, flavonoid
compositions are expected to differ between new species and ancestral
species. In this manner, our model can estimate the divergence degree of
flavonoids from metabolite distributions.

Fig. 4. Comparison of frequency distributions between real data and the model.
The red squares represent the real frequency distribution of the number of
flavonoids in a plant species $n_f$. The green circles indicate the real
frequency distribution of the number of shared species of a flavonoid $n_s$.
The solid lines correspond to the predicted frequency distributions averaged
over 100 realizations of the model with q = 0.26. The inset shows the
distributional distance between predicted distributions and empirical
distributions as a function of the parameter q.

Fig. 5. Comparison of network structure between real data for Rutaceae (A) and
our model with q = 0.26 (B). Filled squares and open circles correspond to plant
species and flavonoids, respectively. The networks are drawn by yEd (8).

In addition, our model might also be applied to a wide range of metabolites
among species because the emergence mechanisms considered in the model are
possibly similar for other metabolites, such as other secondary metabolites
(e.g. alkaloids, terpenoids, and phenolics) and lipids, which are important
from the viewpoints of pharmacodynamics and dietetics. Thus, we can predict
that heterogeneous distributions are observed not only in flavonoids but also
in other metabolites. Furthermore, our model also helps to estimate the
divergence degree of such metabolites.



3.3 Characteristics of hubs in species-flavonoid networks: the case of Fabaceae



We here investigate hubs in species-flavonoid networks. In particular, hubs 
for flavonoid nodes might be key characteristics at higher levels because such 
flavonoids are well conserved among species at higher levels. As an example, 
we focus on Fabaceae (the bean family), already well investigated as the source 
of isoflavonoids. 

Hubs are defined using a Z-score that characterizes deviation from an
average (20; 21). If network connectivity is determined at random, as in a
random graph, the degree distribution approximately follows a normal
distribution. Using the Z-score, we can then find nodes with large numbers of
edges, which are hardly in agreement with the normal distribution. Thus, we
define hubs as nodes with more than $\langle k \rangle + \sigma z_c$ edges,
where $\langle k \rangle$ is the average number of edges over all flavonoid
nodes or species nodes and $\sigma$ is the standard deviation of the number
of edges. $z_c$ is a threshold value used to determine hubs, and we set
$z_c = 2.5$.
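
The hub criterion can be sketched as follows. This is an illustration only: the function name `find_hubs` and the toy degrees are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say whether the population or the sample form is meant.

```python
from statistics import mean, pstdev


def find_hubs(degrees, z_c=2.5):
    """Return the nodes whose degree exceeds mean + z_c * standard deviation."""
    values = list(degrees.values())
    # Assumption: population standard deviation of the node degrees.
    threshold = mean(values) + z_c * pstdev(values)
    return {node for node, k in degrees.items() if k > threshold}


# Toy degrees (hypothetical): twenty flavonoids found in one species each,
# and one flavonoid shared by thirty species.
toy_degrees = {f"flavonoid_{i}": 1 for i in range(20)}
toy_degrees["flavonoid_hub"] = 30
hubs = find_hubs(toy_degrees)
```

In this toy case only `flavonoid_hub` exceeds the threshold, so `hubs` contains that single node.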



Fig. 6 shows the degree of specificity $S_i$ for each flavonoid class, defined as
$S_i = r_i / R_i - 1$, where $r_i$ is the ratio of flavonoid class i in a target
flavonoid set, and $R_i$ is the ratio of flavonoid class i in the whole data set.
A positive $S_i$ indicates a discriminative flavonoid class i.
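
A minimal sketch of this measure, assuming the degree of specificity of a class is the ratio of the class in the target set divided by its ratio in the whole data set, minus one. The function name `specificity` and the toy class labels are hypothetical and only serve to show the calculation.

```python
from collections import Counter


def specificity(target_classes, all_classes):
    """Degree of specificity per class: (ratio in target) / (ratio in whole) - 1."""
    target = Counter(target_classes)
    whole = Counter(all_classes)
    n_target = len(target_classes)
    n_whole = len(all_classes)
    return {
        c: (target[c] / n_target) / (whole[c] / n_whole) - 1
        for c in whole
    }


# Toy class labels (hypothetical), one label per flavonoid.
whole_set = ["FLI"] * 2 + ["FL3"] * 6 + ["FL5"] * 2
target_set = ["FLI"] * 2 + ["FL3"] * 2
scores = specificity(target_set, whole_set)
```

Here `FLI` makes up half of the target set but only a fifth of the whole set, so its score is positive, whereas a class absent from the target set scores -1.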

As shown in this figure, isoflavonoids are significantly overrepresented. It
is well known that isoflavonoids are predominantly found in Fabaceae (22), in
good agreement with the result obtained from our analysis. The presence of
dominant flavonoids is also predicted by our model. Species have many similar
flavonoids due to the evolutionary mechanism: new flavonoids are generated
by modification of existing (ancestral) flavonoids. This result also supports
the reliability of our model.

In particular, the significance of isoflavonoids is clearer in the case of hub
flavonoids. Here, we can also observe discriminative anthocyanins. This class
of flavonoid serves as pigments in plants, and some are well conserved across
many species. In Fabaceae, we thus also expect to characteristically observe
anthocyanins, as typified by chrysanthemin in black beans. In the case of whole
flavonoids, however, anthocyanins are relatively less conspicuous, unlike the
case for hub flavonoids.

Fig. 6. Degree of specificity of each flavonoid class. The case of all flavonoids
[the item "Fabaceae (whole)"] and the case of hub flavonoids [the item "Fabaceae
(Hub)"] are shown.



As explained in the previous section, we may extract overrated and/or
underrated information using whole flavonoids due to the heterogeneous
distribution. To detect higher-level-specific (e.g. family-specific)
characters based on whole flavonoids, we need to assume that the number of
shared species of a flavonoid follows a normal (homogeneous) distribution.
However, the statistics follow a power-law distribution as shown in Fig. 2B,
indicating that the assumption is not appropriate. Therefore, we might
extract more appropriate information at higher levels from hub (well-shared)
flavonoids rather than whole flavonoids.



As an example, let us show the hub isoflavonoids, which are significantly
more widely distributed within Fabaceae. Table 1 shows the list of the top 20
isoflavonoids in Fabaceae ranked by the number of shared species. As shown in
this table, the hub flavonoids are shared among species of diverse types.
Among the hub flavonoids, moreover, we find genistein and daidzein, which are
believed to be the origins of all isoflavones (22). Most of the other hub
flavonoids are synthesized via simple modifications of genistein and
daidzein. This result might support a hypothesis from our model: hub
(well-shared) flavonoids are ancient.



Table 1

List of the top 20 isoflavonoids in Fabaceae ranked by the number of shared species.

| Flavonoid | # Shared species | Shared species |
|---|---|---|
| Tephrosin | 11 | Amorpha fruticosa, Crotalaria spp., Derris spp., Lonchocarpus longifolius, Lonchocarpus spp., Lonchocarpus spruceanus, Millettia dura, Millettia ferruginea, Piscidia mollis, Tephrosia elata, Tephrosia spp. |
| Genistein | 11 | Adenocarpus decorticans, Adenocarpus foliolosus, Calicotome spinosa, Calicotome villosa, Chamaecytisus spp., Cytisus spp., Desmodium uncinatum, Genista spp., Trifolium pratense, Trifolium spp., Ulex spp. |
| Pseudobaptigenin | 10 | Baptisia spp., Baptisia tinctoria, Dalbergia assamica, Dalbergia latifolia, Dalbergia sericea, Dalbergia spp., Dalbergia spruceana, Dalbergia stevensonii, Maackia spp., Pterocarpus spp. |
| (-)-Medicarpin | 9 | Andira inermis, Dalbergia spp., Dalbergia variabilis, Gliricidia sepium, Lathyrus spp., Maackia amurensis, Medicago spp., Trifolium spp., Trigonella spp. |
| Afrormosin | 9 | Afrormosia elata, Baptisia australis, Castanospermum australe, Centrosema spp., Gliricidia sepium, Myrocarpus fastigiatus, Myroxylon balsamum, Onobrychis viciifolia, Pericopsis elata |
| 9-O-Methylcoumestrol | 8 | Cicer arietinum, Dalbergia oliveri, Dalbergia stevensonii, Medicago sativa, Medicago spp., Myroxylon balsamum, Trifolium pratense, Trifolium repens |
| Coumestrol | 8 | Glycine max, Medicago spp., Phaseolus lunatus, Phaseolus spp., Phaseolus vulgaris, Pisum sativum, Trifolium spp., Vigna unguiculata |
| (-)-Maackiain | 8 | Cicer arietinum, Cicer spp., Lathyrus spp., Maackia amurensis, Sophora japonica, Sophora tetraptera, Trifolium spp., Trigonella spp. |
| (-)-Maackiain 3-O-glucoside | 8 | Baptisia australis, Cicer spp., Euchresta japonica, Ononis spinosa, Sophora subprostrata, Tephrosia spp., Thermopsis spp., Trifolium pratense |
| 2'-Hydroxygenistein | 8 | Cajanus cajan, Dolichos biflorus, Laburnum anagyroides, Lupinus angustifolius, Moghania macrophylla, Phaseolus vulgaris, Spartium junceum, Vigna angularis |
| Orobol | 8 | Baptisia spp., Bolusanthus speciosus, Cytisus scoparius, Lathyrus montanus, Lathyrus nissolia, Lathyrus spp., Maackia amurensis, Thermopsis spp. |
| Formononetin | 8 | Baptisia spp., Cicer arietinum, Cicer spp., Dalbergia baroni, Genista spp., Gliricidia sepium, Trifolium pratense, Trifolium spp. |
| Rotenone | 7 | Derris elliptica, Derris spp., Derris trifoliata, Lonchocarpus spp., Millettia spp., Piscidia erythrina, Tephrosia spp. |
| Demethylmedicarpin | 7 | Erythrina crista-galli, Erythrina poeppigiana, Erythrina sandwicensis, Melilotus alba, Pachyrrhizus erosus, Psophocarpus tetragonolobus, Trifolium repens |
| 5-O-Methylgenistein | 7 | Adenocarpus decorticans, Adenocarpus foliolosus, Calicotome spinosa, Calicotome villosa, Chamaecytisus hirsutus, Chamaecytisus supinus, Cytisus spp. |
| Biochanin A | 7 | Cicer spp., Dalbergia spp., Pericopsis spp., Swartzia polyphylla, Thermopsis spp., Trifolium pratense, Trifolium spp. |
| Wighteone | 7 | Argyrocytisus battandieri, Erythrina variegata, Laburnum anagyroides, Lupinus albus, Lupinus angustifolius, Lupinus polyphyllus, Neonotonia wightii |
| Daidzein | 7 | Cajanus cajan, Dalbergia ecastaphyllum, Erythrina crista-galli, Lespedeza bicolor, Trifolium pratense, Trifolium repens, Ulex europaeus |
| Dehydrorotenone | 6 | Derris spp., Lonchocarpus longifolius, Millettia pachycarpa, Neorautanenia amboensis, Tephrosia falciformis, Tephrosia virginiana |
| (-)-cis-Deguelin | 6 | Derris elliptica, Derris trifoliata, Lonchocarpus spp., Millettia spp., Piscidia spp., Tephrosia spp. |

Our model has an assumption: new flavonoids are generated by modification
of existing (ancestral) flavonoids. If this assumption is appropriate, we can
expect hub (well-shared) flavonoids to have relatively simple modifications. As
explained in the previous section, flavonoids consist of backbone structures and
their modifications. Thus, when two flavonoids share a backbone structure, we
can estimate the complexity of their modifications from their masses. In
order to test this hypothesis, we investigate correlations between masses of
flavonoids and the number of shared species $n_s$ (Fig. 7). As an example, we
here focus on backbone structures of two types for isoflavonoids: isoflavone
and isoflavan (frequently observed in Fabaceae). As shown in this figure,
there are negative correlations, implying relatively simple modification of
hub flavonoids. This result suggests validity of the assumption in the model.
In addition, this result also suggests that we can predict the number of
shared species of a flavonoid using its mass. This might be useful for
detecting well-shared flavonoids.
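
The correlation test can be sketched with a small, dependency-free implementation of Spearman's rank correlation (the Pearson correlation of average ranks, so ties are handled). The function names and the toy numbers are hypothetical; they do not come from the data set, and the sketch does not compute P-values.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data (hypothetical): heavier flavonoids shared by fewer species.
masses = [254.2, 270.2, 284.3, 300.3, 432.4]
shared_species = [7, 11, 8, 3, 1]
r = spearman(masses, shared_species)
```

A negative `r`, as in this toy example, is the pattern described above: flavonoids with larger masses (more complex modifications) tend to be shared by fewer species.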



Fig. 7. Correlations between masses of flavonoids and the number of shared
species $n_s$ (horizontal axes: number of shared species of a flavonoid).
(A) Isoflavone (Spearman's rank correlation $r = -0.23$ with $P < 10^{-5}$).
(B) Isoflavan (Spearman's rank correlation $r = -0.33$ with $P < 0.01$).

4 Conclusion

We have found heterogeneous distributions of flavonoids among species, and
these are conserved among several families: fat-tailed distributions of the
number of flavonoids in a species and the number of shared species of a
flavonoid. In particular, we can extract more appropriate and beneficial
flavonoid compositions by taking the heterogeneous distribution into account.
This finding might be useful for taxonomic classification and
characterization of living organisms using secondary metabolites including
flavonoids at higher levels. Furthermore, a simple model has been proposed
for describing a possible origin of the heterogeneous distribution. It has
been shown that the 'rich-get-richer' mechanisms inducing heterogeneous
distributions arise from simple evolutionary mechanisms. The model estimates
the divergence degree of several metabolites including flavonoids, and it
predicts heterogeneous distributions of such metabolites among species. We
furthermore have found relatively simple modifications of well-shared
flavonoids, as suggested by the model. Our model helps with understanding the
evolution of metabolite diversity.